ABSTRACT

A commonsense understanding of the physical world will be crucial
for the robots of the future as they strive to perform everyday
activities and follow instructions formulated by human users in
natural language. One mechanism that is believed to assist human
cognition in commonsense reasoning is mental simulation, the
envisioning of actions before they are performed. We therefore
present a system integrating simulation of robot plans with
probabilistic reasoning about natural-language instructions, to
create a complete pipeline from instruction to execution to storing
and analyzing results of the simulation. This integration allows the
robotic system to efficiently infer knowledge about the physical
world that would be tedious to specify by hand in a collection of
logical statements. Our system will be available online for open use
by researchers.
1. MOTIVATION AND OVERVIEW

In order to interact with humans in everyday activities, robots
must understand and appropriately respond to instructions formulated
in human language. However, natural language instructions are often
incomplete and require implicit, “commonsense” knowledge to
interpret. Also, while the AI community has largely moved away from
believing that symbolic manipulation on its own is capable of
producing meaning, the issue of where meaning could come from in
robot cognition (often referred to as the grounding problem) has no
consensus solution.

One approach to the grounding problem has been the simulation
theory of cognition [1]. This has been put forth as a hypothesis for
where human naive physics knowledge comes from, and has since been
adopted by roboticists as a means to ground robot understanding: a
robot can understand an action if it can simulate it and, based on
the simulation, perform inferences about the action. For example,
in [2] a simulation is used to retrieve information about the force
required to pick up a raw egg without breaking it. In [3] a system
is presented which learns suitable positions to put down an object
through repeated simulations. Simulation techniques can also
indicate whether an arrangement of objects is stable [4].

Figure 1: A screenshot of our system, showing a plan call generated from
a natural-language instruction and the simulation window.

Motivated by the promise of the simulation theory of cognition, we
present a system (available online at
http://prac.informatik.uni-bremen.de/) that implements a complete
pipeline, from parsing instructions given in natural language, to
selecting a plan to execute them, to execution, and collection of
data from the executed run. The execution of the instructions is
performed in a simulated environment. We are also interfacing our
system with IAI Bremen’s openEASE (http://open-ease.org/), so that
data collected from simulation can be analyzed and queried by a
human user as soon as a simulation completes.

Our system aims to provide an environment in which to explore the
applications of simulation to robot cognition: extracting knowledge
from simulation that is not found in the natural language
instructions that triggered it, learning parametrizations for plans,
inferring from experience rules of thumb about object behavior (such
as which containers can hold which objects), and inferring from
experience which simulation settings place particularly high demands
on the robustness of robot behavior. Our system is openly available
and we will continue adding features to the user interface, such as
simulation parametrization and plan library selection.
Figure 2: Architecture overview of our system. For reasons of
responsiveness, several simulated worlds are running concurrently,
waiting for a command from the natural-language understanding
component. The simulation produces logs of data about the robot’s
processes, and a live visualization for the user.


Figure 2 shows an overview of the architecture of our system. On
the server side, we use our PRAC system [5] to understand
natural-language queries and interpret them into a format containing
an action name and role-value pairs for the identified action
parameters. Some of these parameters can be extracted from the
instruction itself; others are inferred probabilistically, based on
training corpora of which parameters tend to appear together with an
action.
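
As a rough illustration of the output format described above, the
following Python sketch represents an interpreted instruction as an
action name with role-value pairs, and fills in unstated roles from
corpus frequencies. It is a simplified stand-in under stated
assumptions: the class names, role names, and the frequency-based
completion are hypothetical and chosen for illustration; they are not
PRAC's actual data structures or its inference method.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class ActionCommand:
    """An interpreted instruction: an action name plus role-value pairs."""
    action: str
    roles: dict = field(default_factory=dict)


class RoleCompleter:
    """Fills in missing roles from frequencies seen in a training corpus.

    Hypothetical, frequency-based stand-in for probabilistic inference;
    it is not the inference procedure of the system described in [5].
    """

    def __init__(self):
        # counts[(action, role)][value]: how often `value` filled `role`
        self.counts = defaultdict(Counter)
        # roles observed at least once for each action
        self.roles_by_action = defaultdict(set)

    def train(self, corpus):
        for cmd in corpus:
            for role, value in cmd.roles.items():
                self.counts[(cmd.action, role)][value] += 1
                self.roles_by_action[cmd.action].add(role)

    def complete(self, cmd):
        """Return a copy of `cmd` with unstated roles set to their most
        frequent value for this action; stated roles are kept as given."""
        completed = dict(cmd.roles)
        for role in sorted(self.roles_by_action.get(cmd.action, ())):
            if role not in completed:
                value, _ = self.counts[(cmd.action, role)].most_common(1)[0]
                completed[role] = value
        return ActionCommand(cmd.action, completed)
```

Given a small corpus of "Filling" commands, completing a command that
only states its goal would add the most frequent theme and source for
that action, while leaving the stated goal untouched.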

The command is then sent to the simulation manager. For reasons of
responsiveness (since it takes several seconds to start a
simulation), we maintain a pool of available simulated worlds. The
manager selects a world appropriate for the command (some commands
will be for simulations in a kitchen context, others in a chemical
lab, and we plan to add more such worlds in the future), then sends
the command to the selected world, where the command is further
interpreted by the simulated robot’s cognitive architecture,
converted into an action plan, and executed. The various nodes of
the simulated robot (its sensors, controllers, and the cognitive
component) are all implemented as ROS nodes. A live visualization of
the simulation is provided via Robot Web Tools; it allows the user
to see the simulated robot and objects, as well as any visualization
markers created by the robot plan. Interaction with the simulation
through the visualization window is not yet supported. The robot
inside the simulation is controlled by the Cognitive Robot Abstract
Machine (CRAM) [6], which is a system for enabling the programming
and execution of reactive, flexible, taskable robot behaviors.
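
The pooling idea described above can be sketched in Python as follows.
This is a minimal sketch, not the actual simulation manager: the
class names, the "context" field on the command, and the string
returned by execute are all assumptions made for illustration, and a
real world would forward the command to the simulated robot and be
reset before reuse.

```python
import queue


class SimulatedWorld:
    """Placeholder for an already-started simulation (hypothetical)."""

    def __init__(self, context, world_id):
        self.context = context
        self.world_id = world_id

    def execute(self, command):
        # A real world would pass the command to the robot's cognitive
        # architecture; this stub only reports what would be run.
        return f"world {self.world_id} ({self.context}) runs {command['action']}"


class SimulationManager:
    """Keeps pre-started worlds per context and hands commands to them."""

    def __init__(self, contexts, worlds_per_context=2):
        self._idle = {}
        next_id = 0
        for context in contexts:
            pool = queue.Queue()
            for _ in range(worlds_per_context):
                pool.put(SimulatedWorld(context, next_id))
                next_id += 1
            self._idle[context] = pool

    def dispatch(self, command, timeout=None):
        # Assumption: the command names the context it needs.
        context = command["context"]
        if context not in self._idle:
            raise KeyError(f"no simulated world for context {context!r}")
        pool = self._idle[context]
        # Blocks until a world is idle (raises queue.Empty on timeout).
        world = pool.get(timeout=timeout)
        try:
            return world.execute(command)
        finally:
            # Return the world to the pool; a real system would reset it.
            pool.put(world)
```

Because the worlds are created when the manager is constructed, a
call to dispatch pays no startup cost; it waits only if every world
for the requested context is busy.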

Another important component is the logger, which records
information from the cognitive component of the simulated robot, as
well as the simulator’s output with regard to object movement. The
data from a simulation is stored on our server; we are connecting
our system to IAI Bremen’s openEASE so as to allow users to
visualize and query this data after a simulation completes.

2. RELATED WORK 

The simulation theory of cognition puts forth the hypothesis that
“thinking is simulated interaction with the environment” [7], and
can be traced back to older ideas in philosophy (see [7]). More
recently, it has been detailed and explored in neuroscience, for
example in studies of how something like mental simulation may
accelerate feedback for motor control [8] and support language
understanding [9], physics knowledge, and understanding other
minds [1].

In the field of neuroscience, several pieces of research indicate
that simulation may be a useful mental process for language
understanding [10, 9], physics reasoning and understanding other
minds [1], and may be crucial for achieving expert performance [11].
Research has also shown that similar neural regions are active
during performance of an activity in the real world, as well as just
imagining the activity [12]. It is possible that humans use a
“noisy” or approximate physics model [13], possibly non-Newtonian.
Also, the quick acquisition of language and motor skills by very
young children suggests they are learned in parallel and reinforce
each other [14].

The simulation theory of cognition has also been applied to the
research and development of artificial agents. Physics simulation
was used to tackle a benchmark commonsense reasoning problem, the
“Egg Cracking” problem [15], where it proved more scalable and
easier to use than previous axiom-based approaches. Other uses have
involved training controllers for walking [16] or cutting [17] in
simulation before attempting real actions, and obtaining priors for
body tracking [18].

There are also criticisms of the simulation theory of
cognition [19, 20]. In brief, the arguments are that the veridical,
detailed representations simulations require are not compatible with
the incomplete knowledge an agent has about its world; that
simulations are inaccurate idealizations; and that they have limited
usefulness beyond prediction. We take the opposing view: that
simulation can handle the grounding and frame problems much more
easily than other approaches, that inaccuracy does not prevent
simulation from producing good guesses, and that simulating several
scenarios can give an agent the evidence to infer more general
principles.

Natural language interpretation by artificial agents is a very
active research field. As a few examples, probabilistic methods have
been explored to infer vague instruction parameters: Markov Logic
Networks are used in [5], and Conditional Random Fields in [21].
In [22], a system is presented that extracts machine-readable
activity knowledge from instructional websites aimed at humans.

In terms of architectures for producing and managing robot
behavior, we mention several. ROSCo [23] allows human users to
specify and share robot behaviors represented as hierarchical state
machines. The Cognitive Robot Abstract Machine (CRAM) [6] defines a
richer plan language to specify generic plans that are adaptable and
taskable, and that support failure recovery; we use CRAM for our
simulated robot’s cognitive component. The system of [24] integrates
task and motion planning into a hierarchical behavior generation
system. This is developed further in [25], where the planning is
performed in belief space. Of particular interest to this paper is
the system presented in [21], which also combines natural language,
simulation, and robot execution of actions. However, the focus of
that research was for a robotic system to learn what actions are
appropriate as a response to natural language queries, based on
demonstrations from human users in simulated environments. In
contrast, we are here interested in what knowledge a robotic agent
can learn from simulating its own behavior.